9 research outputs found

    SurgMAE: Masked Autoencoders for Long Surgical Video Analysis

    There has been growing interest in using deep learning models to process long surgical videos, in order to automatically detect clinical/operational activities and extract metrics that can enable workflow-efficiency tools and applications. However, training such models requires vast amounts of labeled data, which is costly to obtain and does not scale. Recently, self-supervised learning has been explored in the computer vision community to reduce the burden of annotation. Masked autoencoders (MAE) have gained attention in the self-supervised paradigm for Vision Transformers (ViTs): by predicting randomly masked regions from the visible patches of an image or video clip, they have shown superior performance on benchmark datasets. However, the application of MAE to surgical data remains unexplored. In this paper, we first investigate whether MAE can learn transferable representations in the surgical video domain. We then propose SurgMAE, a novel architecture with a masking strategy for MAE based on sampling high-information spatio-temporal tokens. We provide an empirical study of SurgMAE on two large-scale datasets of long surgical videos and find that our method outperforms several baselines in the low-data regime. We conduct extensive ablation studies to show the efficacy of our approach, and also demonstrate its superior performance on UCF-101 to establish its generalizability to non-surgical datasets.
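
    The abstract's key idea, masking most tokens and reconstructing them from the visible remainder, can be illustrated with a small sampler. Below is a minimal sketch assuming a grayscale clip, a 75% mask ratio, and a temporal-difference motion score as the token-importance measure; these are illustrative choices, not details taken from the SurgMAE paper.

```python
import numpy as np

def mask_tokens(clip, patch=16, mask_ratio=0.75, rng=None):
    """clip: (T, H, W) grayscale video; returns indices of visible tokens."""
    rng = rng or np.random.default_rng(0)
    T, H, W = clip.shape
    gh, gw = H // patch, W // patch
    # Tokenize: one token per (frame, patch_row, patch_col) cell.
    tokens = clip[:, :gh * patch, :gw * patch].reshape(T, gh, patch, gw, patch)
    tokens = tokens.transpose(0, 1, 3, 2, 4).reshape(T, gh * gw, -1)
    # Score each token by its temporal change (a crude motion proxy).
    motion = np.abs(np.diff(tokens, axis=0, prepend=tokens[:1])).mean(-1)
    scores = motion.reshape(-1) + 1e-6            # avoid zero probabilities
    probs = scores / scores.sum()
    n_visible = int(scores.size * (1 - mask_ratio))
    # High-motion tokens are kept visible more often; the rest are masked
    # and must be reconstructed by the decoder during pre-training.
    visible = rng.choice(scores.size, size=n_visible, replace=False, p=probs)
    return np.sort(visible)

demo = np.random.rand(8, 224, 224)   # toy 8-frame clip
print(mask_tokens(demo).shape)       # (392,): 25% of 1568 tokens stay visible
```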

    Tracking and Mapping in Medical Computer Vision: A Review

    As computer vision algorithms become more capable, their applications in clinical systems will become more pervasive. These applications include diagnostics such as colonoscopy and bronchoscopy, guiding biopsies, minimally invasive interventions and surgery, automating instrument motion, and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing and applying algorithms that perform well in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which results in a final list of 515 papers that we cover. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review the datasets available in the field and the clinical needs they address. We then delve into the algorithmic side and summarize recent developments, which should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We focus on algorithms for deformable environments, while also reviewing the essential building blocks of rigid tracking and mapping, since there is substantial crossover in methods. Finally, we discuss the current state of tracking and mapping methods, along with needs for future algorithms, needs for quantification, and the viability of clinical applications in the field. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation. Comment: 31 pages, 17 figures

    Image and haptic guidance for robot-assisted laparoscopic surgery

    Surgical removal of the prostate gland using the da Vinci surgical robot is the state-of-the-art treatment option for organ-confined prostate cancer. The da Vinci system provides excellent 3D visualization of the surgical site and improved dexterity, but it lacks haptic force feedback and subsurface tissue visualization. The overall objective of the work in this thesis is to augment the existing visualization tools of the da Vinci with ones that can identify the prostate boundary, critical structures, and cancerous tissue, so that prostate resection can be carried out with minimal damage to the adjacent critical structures and, therefore, with minimal complications. Towards this objective, we designed and implemented a real-time image guidance system based on a robotic transrectal ultrasound (R-TRUS) platform that works in tandem with the da Vinci surgical system and tracks its surgical instruments. In addition to ultrasound as an intrinsic imaging modality, the system was first used to bring pre-operative magnetic resonance imaging (MRI) into the operating room by registering the pre-operative MRI to the intraoperative ultrasound and displaying the MRI image at the correct physical location based on the real-time ultrasound image. Second, a method of using the R-TRUS system for tissue palpation is proposed by expanding it to work in conjunction with a real-time strain imaging technique. Third, another R-TRUS-based system is described for detecting dominant prostate tumors, based on a combination of features extracted from a novel multi-parametric quantitative ultrasound elastography technique. We tested our systems in an animal study, followed by human patient studies involving n = 49 patients undergoing da Vinci prostatectomy. The clinical studies were conducted to evaluate the feasibility of using these systems in real human procedures, and also to improve and optimize our imaging systems using patient data. Finally, a novel force feedback control framework is presented as a solution to the lack of haptic feedback in currently used clinical surgical robots. The framework has been implemented on the da Vinci surgical system using the da Vinci Research Kit controllers, and its performance has been evaluated through user studies. (Applied Science, Faculty of; Electrical and Computer Engineering, Department of; Graduate)
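
    The thesis registers pre-operative MRI to intraoperative ultrasound so the MRI can be displayed at the correct physical location. One standard building block for this kind of alignment, shown below purely as an illustration (the thesis's actual registration pipeline is not described in the abstract), is rigid point-based registration of matched landmarks via SVD, the Kabsch/Procrustes solution.

```python
import numpy as np

def rigid_register(mri_pts, us_pts):
    """Find R, t minimizing ||R @ mri + t - us||^2 over matched 3D points."""
    mu_m, mu_u = mri_pts.mean(0), us_pts.mean(0)
    H = (mri_pts - mu_m).T @ (us_pts - mu_u)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection solution (det = -1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_u - R @ mu_m
    return R, t

# Toy check: recover a known rotation/translation from noiseless landmarks.
rng = np.random.default_rng(1)
pts = rng.random((6, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
R, t = rigid_register(pts, pts @ R_true.T + np.array([1.0, 2.0, 3.0]))
print(np.allclose(R, R_true), t.round(3))   # True [1. 2. 3.]
```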

    Adaptation of Surgical Activity Recognition Models Across Operating Rooms

    Automatic surgical activity recognition enables more intelligent surgical devices and a more efficient workflow. Integrating such technology into new operating rooms has the potential to improve care delivery to patients and decrease costs. Recent works have achieved promising performance on surgical activity recognition; however, the lack of generalizability of these models is one of the critical barriers to wide-scale adoption of this technology. In this work, we study the generalizability of surgical activity recognition models across operating rooms. We propose a new domain adaptation method to improve the performance of a surgical activity recognition model in a new operating room for which we have only unlabeled videos. Our approach generates pseudo labels for the unlabeled video clips it is confident about and trains the model on augmented versions of those clips. We extend our method to a semi-supervised domain adaptation setting where a small portion of the target domain is also labeled. In our experiments, our proposed method consistently outperforms the baselines on a dataset of more than 480 long surgical videos collected from two operating rooms. Comment: MICCAI 202
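
    At a high level, the described approach labels its own confident predictions and retrains on augmented clips. The sketch below shows one common form of such confidence-thresholded self-training; the threshold value, augmentation hook, and model interface are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, clips, augment, threshold=0.9):
    """One self-training step on unlabeled clips from a new operating room."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(clips), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf >= threshold          # trust only confident predictions
    if keep.sum() == 0:
        return None                       # nothing confident in this batch
    model.train()
    # Train on augmented versions of the confidently pseudo-labeled clips.
    logits = model(augment(clips[keep]))
    loss = F.cross_entropy(logits, pseudo[keep])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```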

    SENDD: Sparse Efficient Neural Depth and Deformation for Tissue Tracking

    Deformable tracking and real-time estimation of 3D tissue motion are essential to enable automation and image guidance applications in robotically assisted surgery. Our model, Sparse Efficient Neural Depth and Deformation (SENDD), extends prior 2D tracking work to estimate flow in 3D space. SENDD introduces the novel contributions of learned detection and sparse per-point depth and 3D flow estimation, all with fewer than half a million parameters. It does this by using graph neural networks over sparse keypoint matches to estimate both depth and 3D flow. We quantify and benchmark SENDD on a comprehensively labelled tissue dataset and compare it to an equivalent 2D flow model. SENDD performs comparably while enabling applications that 2D flow cannot. SENDD can track points and estimate depth at 10 fps on an NVIDIA RTX 4000 for 1280 tracked (query) points, and its cost scales linearly with the number of points. SENDD enables multiple downstream applications that require 3D motion estimation. Comment: 12 pages, 4 figures
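
    A toy illustration of the abstract's central design, sparse per-point estimation with a graph neural network over keypoints, appears below. The layer sizes, k-nearest-neighbor graph, and residual update are illustrative assumptions rather than SENDD's actual architecture.

```python
import torch
import torch.nn as nn

class SparsePointGNN(nn.Module):
    """Toy k-NN graph network regressing depth + 3D flow per keypoint."""
    def __init__(self, feat_dim=32, k=8):
        super().__init__()
        self.k = k
        self.msg = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 4)     # 1 depth + 3 flow components

    def forward(self, pts, feats):
        """pts: (N, 2) keypoint coords; feats: (N, F) per-point descriptors."""
        # Brute-force k-NN here is quadratic; a real system would use a
        # spatial index to keep the cost near-linear in the point count.
        d = torch.cdist(pts, pts)
        idx = d.topk(self.k + 1, largest=False).indices[:, 1:]   # drop self
        nbr = feats[idx]                                         # (N, k, F)
        ctr = feats.unsqueeze(1).expand_as(nbr)
        msgs = self.msg(torch.cat([ctr, nbr], dim=-1)).mean(dim=1)
        out = self.head(feats + msgs)           # residual feature update
        return out[:, :1], out[:, 1:]           # depth, 3D flow

net = SparsePointGNN()
depth, flow = net(torch.rand(1280, 2), torch.rand(1280, 32))
print(depth.shape, flow.shape)   # torch.Size([1280, 1]) torch.Size([1280, 3])
```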

    DETC2009/MESA-86451 DESIGN AND RECONFIGURATION ALGORITHM OF HEXBOT: A MODULAR SELF-RECONFIGURABLE ROBOTIC SYSTEM

    This paper primarily addresses the design and implementation of a planar hexagonal Modular Self-Reconfigurable Robotic System (MSRRS), along with the construction of its reconfiguration path planner and control algorithm. A universal module is carefully designed in line with the common goals of MSRRS, including homogeneity, cost-effectiveness, fast actuation, and quick, strong connections. Although the implemented working prototype is both large and restricted to a planar geometry, it is designed such that its hardware and software can be scaled up in the number of units and down in unit size; similarly, the platform has the potential to be extended to 3D applications. The software infrastructure of this platform is designed so that different hierarchies for distributed control and communication can be implemented. The algorithmic design is based on a hierarchical multilayer approach, where upper layers decompose the problem into sub-problems solvable by lower layers. An optimal reconfiguration path planner is developed to minimize the number of module movements during reconfiguration while enforcing collision-avoidance and connectivity constraints, in addition to taking into account the kinematic model of the platform. The core of the algorithm relies on a heuristic function and a Markov Decision Process (MDP) optimization to generate a near-optimal reconfiguration path planner and control algorithm for HexBot.
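
    The abstract describes a planner that minimizes module movements subject to connectivity constraints. The sketch below shows one way such a search can look on a planar hex lattice, using A*-style search with an admissible heuristic (the number of misplaced modules); the coordinates, single-move model, and heuristic are illustrative assumptions, and the MDP layer of the actual algorithm is not modeled here.

```python
import heapq
from itertools import count

HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def neighbors(cell):
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_DIRS]

def connected(cells):
    """True if the occupied hex cells form one connected component."""
    cells = set(cells)
    stack, seen = [next(iter(cells))], set()
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(n for n in neighbors(c) if n in cells)
    return seen == cells

def plan(start, goal):
    """A*-style search over configurations; heuristic = misplaced modules."""
    start, goal = frozenset(start), frozenset(goal)
    h = lambda s: len(s - goal)      # admissible: one move fixes one module
    tie = count()
    frontier = [(h(start), 0, next(tie), start)]
    best = {start: 0}
    while frontier:
        _, cost, _, state = heapq.heappop(frontier)
        if state == goal:
            return cost              # minimum number of module moves
        for m in state:              # try relocating each module
            rest = state - {m}
            if not rest or not connected(rest):
                continue             # lifting m would split the robot
            # Legal destinations: empty cells adjacent to the remaining body.
            for c in {n for r in rest for n in neighbors(r)} - state:
                nxt = rest | {c}
                if cost + 1 < best.get(nxt, float("inf")):
                    best[nxt] = cost + 1
                    heapq.heappush(
                        frontier, (cost + 1 + h(nxt), cost + 1, next(tie), nxt))

# Reshape a 3-module line along one axis into a line along the other: 2 moves.
print(plan([(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 1), (0, 2)]))
```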